Abstract

Background: Pregnancy associated glycoproteins form a diverse family of glycoproteins that are variably expressed
at different stages of gestation. They are probably involved in immunosuppression of the dam against the
feto-maternal placentome. The presence of the products of binucleate cells in maternal circulation has also been
correlated with placentogenesis and placental re-modeling. The exact structure and function of the gene product is
unknown due to limitations on obtaining purified pregnancy associated glycoprotein preparations.

Results: Our study describes an in silico derived 3D model for bubaline pregnancy associated glycoprotein 2.
Structure-activity features of the protein were characterized, and functional studies predict bubaline pregnancy
associated glycoprotein 2 as an inducible, extra-cellular, non-essential, N-glycosylated, aspartic pro-endopeptidase
that is involved in down-regulation of complement pathway and immunity during pregnancy. The protein is also
predicted to be involved in nutritional processes, and apoptotic processes underlying fetal morphogenesis and
re-modeling of feto-maternal tissues.

Conclusion: The structural and functional annotation of buPAG2 should allow the design of mutants and inhibitors
for dissection of the exact physiological role of the protein.
Background

Pregnancy associated glycoproteins (PAGs) were first isolated in 1982 by
Butler and co-workers from the outer epithelial cell layer
(chorion/trophectoderm) of the bovine feto-maternal membranes, where they
are secreted by binucleate cells [1,2]. Subsequently, PAGs have been isolated
from several other species, including sheep, goat, buffalo, cat, pig and
horse. Presently, more than 100 PAG genes are known in ruminants, forming a
very diverse family of glycoproteins that are variably expressed at different
stages of gestation, from about the 7th day post-fertilization onwards,
largely in the pre-placental trophoblast and the post-implantation
trophectoderm [3]. Also known as pregnancy specific protein-B (PSPB) or
pregnancy specific protein (PSP)-60, they are thought to act as
immunosuppressants that allow the immunological acceptance of the embryo by
the dam. The presence of the products of binucleate cells in maternal
circulation has also been correlated with placentogenesis and placental
re-modeling [4]. However, the exact structure and function of the gene
product remain largely undetermined, the major bottleneck being the
difficulty of obtaining purified PAG preparations. PAGs show high sequence
homology as a group, and also to aspartic proteases, viz. pepsin, cathepsin
and chymosin. Given the availability of 3D structures of these homologous
proteins, the structure of a PAG can be predicted from its amino acid
sequence with high confidence.

In the absence of experimentally determined protein 
structures, a homology-based model may serve as a good 
starting point for investigation of sequence-structure- 
function relationships. Although homology-modeled 
structures may often not be accurate enough to allow 
characterization of protein-protein or protein-inhibitor 
interactions at the atomic level, they can suggest which 
sequence regions or individual amino acids are essential 
functional components of the protein. Our study
describes the first 3D model for a PAG, using bubaline
PAG2 (buPAG2) as a candidate, obtained through a
combination of several in silico modeling approaches. In
addition, primary and secondary structure analysis and
functional annotation studies were performed.

Methods 

Sequence retrieval and analysis 

The amino acid sequence of buPAG2 [GenBank: ADO67791.1] was retrieved
from the GenBank database at NCBI [5]. ProtParam [6] was used to predict
physicochemical properties. The parameters computed by ProtParam
included the molecular weight, theoretical pI, amino acid
composition, atomic composition, extinction coefficient,
estimated half-life, instability index, aliphatic index, and
grand average of hydropathy (GRAVY).
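
Two of the ProtParam parameters listed above have simple closed forms. The following is a minimal sketch, not ProtParam's own code: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index formula, and the function names and the toy peptide are hypothetical (the peptide is not the buPAG2 sequence).

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean hydropathy value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy_peptide = "MKWLVLLGLVAFSEC"  # hypothetical sequence for illustration only
    print(f"GRAVY: {gravy(toy_peptide):.3f}")
    print(f"Aliphatic index: {aliphatic_index(toy_peptide):.2f}")
```

A negative GRAVY value indicates a sequence that is hydrophilic on average, and a positive value a hydrophobic one.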

3D modeling of buPAG2

A PSI-BLAST (Position Specific Iterated-Basic Local
Alignment Search Tool) [7] search with default parameters
was performed against the Protein Data Bank (PDB) to
find a suitable template for homology modeling. The template
thus identified was used for homology modeling
with the modeling package MODELLER 9v10 [8].

Figure 1 3D models of buPAG2. A bi-lobed structure, typical of eukaryotic aspartyl proteases, is evident. a. Ribbon model of buPAG2; helices are
depicted in blue and sheets in red. b. Molecular surface model of buPAG2 colored by ConSurf implementation in YASARA View.



Protein structure accession number 

The final 3D structure of buPAG2 was submitted to the
Protein Model Database (PMDB) [18].



Model optimization, quality assessment and visualization 

Hydrogen addition and clash reduction were performed
in Swiss-Pdb Viewer 4.0.4 [9]. Energy minimization was
performed in vacuo with the GROMOS96 43B1 parameter set,
using the GROMOS96 implementation in Swiss-Pdb Viewer [10].
Remaining errors in the model were fixed using the tools
at the WHAT IF Web Interface [11]. For structural evaluation
and stereo-chemical analyses, the 3D model was submitted to
PDBsum [12]. The overall quality of the structure was
determined by ERRAT [13]. Visualization of 3D structures, and
superposition, alignment and RMSD determination of query and
template structures, were performed in YASARA View [14]. For
structural alignment, the MUSTANG implementation [15] of
YASARA View was used.

The glycosylation sites were predicted using the NetOGlyc,
NetNGlyc and YinOYang tools, and the signal peptide
was predicted using the SignalP tool, all provided by the Centre for
Biological Sequence Analysis, Technical University of
Denmark (CBS DTU) [16,17].



Functional annotation of buPAG2

BuPAG2 was analyzed for the presence of conserved
domains based on sequence similarity search with close
orthologous family members. For this purpose, three
bioinformatics tools and databases were used: InterProScan [19],
the Protein Families Database (Pfam) [20], and the
NCBI Conserved Domains Database (NCBI-CDD) [21].
InterProScan is a tool that combines different
protein signature recognition methods native to the InterPro
member databases into one resource, with look-up of the
corresponding InterPro and GO annotation. Pfam is a database of
protein families, including their annotations and multiple
sequence alignments generated using hidden Markov
models. NCBI-CDD is a protein annotation resource consisting
of a collection of well-annotated multiple sequence
alignment models for ancient domains and full-length proteins.
Additionally, queries were submitted to the ProKnow
[22] and Kihara Protein Function Prediction (PFP) [23]
servers for functional annotation of buPAG2.

Essential proteins of a cellular organism are necessary
for survival; information about the essentiality of PAG was
retrieved from the Database of Essential Genes (DEG)
[24]. An E-value cut-off of 10⁻¹⁰ and a minimum bit score of
100 were used to scan buPAG2 against all essential proteins
listed in DEG using BlastP. To check the involvement
of PAG in metabolic pathways, the KEGG automatic
annotation server (KAAS) was used [25].

Figure 2 Predicted secondary structure of buPAG2. 15 helices and 7 sheets are present; 2 disulfide linkages are also predicted (Generated
from PDBsum).
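
The DEG screening thresholds described above amount to a two-condition filter on BlastP hits. The sketch below illustrates that filter under stated assumptions: the `BlastHit` record, the function names and the example hits are all hypothetical, and treating both cut-offs as inclusive is an assumption rather than something the text specifies.

```python
from typing import List, NamedTuple


class BlastHit(NamedTuple):
    """One BlastP hit; field names are illustrative, not a BLAST or DEG API."""
    subject_id: str
    evalue: float
    bit_score: float


E_VALUE_CUTOFF = 1e-10  # E-value cut-off stated in the text
MIN_BIT_SCORE = 100.0   # minimum bit score stated in the text


def passes_thresholds(hit: BlastHit) -> bool:
    # Assumption: both thresholds are inclusive.
    return hit.evalue <= E_VALUE_CUTOFF and hit.bit_score >= MIN_BIT_SCORE


def essential_homologs(hits: List[BlastHit]) -> List[BlastHit]:
    """Keep only the hits that satisfy both thresholds."""
    return [hit for hit in hits if passes_thresholds(hit)]


if __name__ == "__main__":
    # Hypothetical hits, invented for illustration only.
    hypothetical_hits = [
        BlastHit("DEG_example_1", 3e-4, 45.1),
        BlastHit("DEG_example_2", 1e-12, 88.0),
    ]
    print(essential_homologs(hypothetical_hits))
```

A query with no hit passing both thresholds, as in this invented example, would be reported as having no essential homolog in DEG.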

Results and discussion 

The present study focused on sequence, structural and
functional analysis of PAGs using buPAG2 as a model.
ProtParam was used to compute physicochemical
properties from the amino acid sequence. The 367 amino
acid long buPAG2 was predicted to have a molecular
weight of 40804.7 Daltons and an isoelectric point (pI) of
6.34. An isoelectric point slightly below 7 indicates that the
protein carries a slight net negative charge at neutral pH, and an
instability index of 49.21 (above the threshold of 40) suggests an
unstable protein. The slightly negative GRAVY index of -0.015 is
indicative of a marginally hydrophilic, soluble protein.

Homology modeling of buPAG2

The 3D model of a protein provides invaluable
insights into the structural basis of its function. Homology
or comparative modeling is the most common
structure prediction method, and numerous online servers
and tools are available for it. A PSI-BLAST search against
the Protein Data Bank (PDB) identified 3PSG_A as the best
available template for homology modeling of buPAG2, with
47.59% sequence identity over 96% query coverage.
3PSG_A is a refined X-ray diffraction model of the
A-chain of porcine pepsinogen at a resolution of 1.65 Å.
The query sequence and template structure were then
provided as inputs to MODELLER 9v10 to generate the
3D model of buPAG2.

Energy minimization, quality assessment and visualization 

The model generated by MODELLER was subjected to energy
minimization, assessed for both geometric and
energy aspects using Swiss-Pdb Viewer, and refined using the
WHAT IF Web Interface. The final model (Figure 1)
showed a quality factor of 83.143% in ERRAT. The positioning
of secondary structural elements was generated
from PDBsum. In all, the predicted model of buPAG2
was found to contain 7 sheets, 9 beta hairpins, 2 psi
loops, 6 beta bulges, 26 strands, 15 helices, 4 helix-helix
interactions, 32 beta turns, 6 gamma turns and 2 disulphide
linkages (Figure 2).

Several structure assessment methods, including Ramachandran
plots and RMSD, were used to check the
reliability of the predicted 3D model. Ramachandran plots
were obtained from PDBsum for quality assessment.
Only 1 (0.3%) of the total 367 residues was present in the
disallowed region, whereas another 5 residues were present
in the generously allowed regions (Figure 3). G-factors
provide a measure of how unusual a stereo-chemical property
is. Values below -0.5 represent an unusual property,
whereas values below -1.0 represent a highly unusual one.
The G-factors for dihedral angles and main chain covalent
forces were calculated to be -0.37 and 0.14, respectively.
The overall average G-factor for the buPAG2 model was
-0.16. The Ramachandran plot and G-factors indicate that
the backbone dihedral angles, phi and psi, in the 3D model
of buPAG2 are well within acceptable limits.

Figure 3 Ramachandran plot for the predicted model of buPAG2. 1 outlier (Glu 210) is present. 5 residues (Ile 13, Glu 39, Ile 52, Ser 237 and
Lys 289) are present in the generously allowed region. All other 361 residues are in the allowed regions (Generated from PDBsum).
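
The G-factor thresholds and the outlier percentage quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names and the label strings are hypothetical, and the thresholds are those stated in the text, not taken from the PDBsum code.

```python
def classify_g_factor(value: float) -> str:
    """Label a G-factor using the thresholds given in the text."""
    if value < -1.0:
        return "highly unusual"
    if value < -0.5:
        return "unusual"
    return "acceptable"


def percent_of_residues(count: int, total: int) -> float:
    """Share of residues in a Ramachandran region, as a percentage."""
    return 100.0 * count / total


if __name__ == "__main__":
    # Values reported in the text for the buPAG2 model.
    reported = {
        "dihedral angles": -0.37,
        "main chain covalent forces": 0.14,
        "overall average": -0.16,
    }
    for name, value in reported.items():
        print(f"{name}: {value:+.2f} -> {classify_g_factor(value)}")
    print(f"disallowed: {percent_of_residues(1, 367):.1f}% of residues")
```

All three reported G-factors lie above -0.5, which is consistent with the statement that the model geometry is within acceptable limits.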

The Root Mean Square Deviation (RMSD) indicates
the degree to which two 3D structures are similar; the
lower the value, the more similar the structures. The
template and query structures were superimposed for
the calculation of RMSD (Figure 4). The RMSD value
obtained from superimposition of buPAG2 and 3PSG_A,
using MUSTANG in YASARA View, was found to be
0.447 Å over a total of 353 aligned residues. The overall
quality factor, Ramachandran plot characteristics, G-factors
and RMSD value support the quality of the homology
model of buPAG2. The final protein structure was
deposited in PMDB [PMDB: PM0077895].
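
For reference, the RMSD over N aligned atom pairs is the square root of the mean squared distance between paired coordinates. The sketch below computes it for two coordinate lists that are assumed to be already superposed and paired one-to-one (the superposition step performed by MUSTANG is not reproduced here); the function name and the toy coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: List[Point], coords_b: List[Point]) -> float:
    """RMSD between two equally long, already superposed coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in angstroms, invented for illustration only.
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.2, 0.0)]
    template = [(0.1, 0.0, 0.0), (1.5, 0.1, 0.0), (3.0, 0.0, 0.1)]
    print(f"RMSD: {rmsd(model, template):.3f} A")
```

Because the value depends on which atoms are paired, an RMSD is only comparable across studies when the number of aligned residues is reported with it, as is done above.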



The glycosylation sites were predicted using the NetOGlyc,
NetNGlyc and YinOYang tools provided by CBS
DTU (Figure 5). NetOGlyc could not detect any O-glycosylation
sites; NetNGlyc predicted N-glycosylation sites at
residues 48, 68, 251 and 340. One N-glycosylation site was also
predicted with low confidence at position 245. YinOYang
predicted 5 O-(beta)-GlcNAc sites at residues 112, 113,
234, 236 and 296. Four other sites were also predicted with
low confidence at residues 97, 98, 106 and 302. Of the total
9 sites, residues 112 and 302 were also predicted as Yin-Yang
sites. Yin-Yang sites are Ser/Thr residues that are
O-(beta)-GlcNAcylated as well as phosphorylated; these are
reversibly and dynamically modified by O-GlcNAc or
phosphate groups at different times. Butler et al. also
recorded large disparity in the glycosylation pattern of
PAGs [1]. SignalP recognized the first 12 residues in the
sequence as a signal peptide for extracellular secretion of
the protein (Figure 6).
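
Candidate N-glycosylation sites are conventionally sought at Asn-Xaa-Ser/Thr sequons, where Xaa is any residue except Pro. The sketch below only lists such sequons with a regular expression; it is not the NetNGlyc method, which additionally scores each sequon, and the function name and test peptide are hypothetical.

```python
import re
from typing import List

# Asn followed by any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_peptide = "MANGTKLNPSAANNTSQ"  # hypothetical sequence, not buPAG2
    print(find_sequons(toy_peptide))
```

A sequon is necessary but not sufficient for glycosylation, which is why a predictor may report fewer sites, or sites at differing confidence, than a plain pattern scan.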

Functional annotation of buPAG2 

Presently, PAGs are known to be pregnancy induced
proteins expressed from about the 7th day post-fertilization
onwards, largely in the pre-placental trophoblast and
post-implantation trophectoderm. In the present study, a
systematic workflow consisting of several bioinformatics
tools and databases was defined and used for the structural
and functional annotation of buPAG2. Three web tools were
used to search for conserved domains and potential functions
of buPAG2. Based on consensus predictions made by Pfam,
NCBI-CDD and InterProScan, buPAG2 belongs
to the aspartate protease superfamily and possesses a
eukaryotic aspartyl protease domain. Aspartic proteases
are a family of protease enzymes that use an aspartate
residue for catalysis of their peptide substrates. In general,
they have two highly conserved aspartates in the active
site and are optimally active at acidic pH.

Figure 4 Structural alignment of buPAG2 with the template 3PSG_A. Structural alignment of predicted model of buPAG2 (blue) with the
template 3PSG_A (cyan) is shown. RMSD value of 0.447 Å was found over 353 aligned residues (Calculated with MUSTANG implementation in
YASARA View).

Figure 5 Prediction of glycosylation sites in buPAG2. a. 4 N-glycosylation sites were predicted by NetNGlyc 1.0. b. 5 O-(beta)-GlcNAc sites and
2 Yin-Yang sites were predicted by YinOYang 1.2. No O-glycosylation sites were predicted.

Eukaryotic aspartic proteases include pepsins, cathepsins,
and renins. They have a two-domain structure, one
lobe having probably evolved from the other through a
gene duplication event in the distant past. Each domain
contributes a catalytic Asp residue, with an extended active
site cleft localized between the two lobes of the molecule.
In modern-day enzymes, although the three-dimensional
structures are very similar, the amino acid sequences are more
divergent, except for the catalytic site motif, which is
highly conserved. The presence and position of disulfide
bridges are other conserved features of aspartic
peptidases [26,27].

Most eukaryotic endopeptidases are synthesized with 
signal and propeptides. The animal pepsin-like endopep- 
tidase propeptides form a distinct family of propeptides, 
which contain a conserved motif approximately 30 resi- 
dues long. The propeptide contains two helices that 
block the active site cleft; in particular, the conserved 
Asp residue in the protease hydrogen bonds to a con- 
served Arg residue in the propeptide. This hydrogen 
bond stabilizes the propeptide conformation and is prob- 
ably responsible for triggering the conversion of the 
zymogen to active enzyme under acidic conditions 
[26,27]. 

In our sequence, Pfam recognized a 29 amino acid long
propeptide sequence from residues 13 to 41 and a 304
amino acid long eukaryotic aspartyl protease sequence
from residues 64 to 367. Active sites of the protease were
recognized at positions 83 and 264. The first 12 residues
were recognized by SignalP as the signal peptide.
The propeptide was recognized as a member of the A1
Propeptide family (PF07966), whereas the aspartyl
protease was recognized as a member of the Asp family
(PF00026) under the peptidase clan AA (CL0129).
These predictions are similar to those of InterProScan,
which recognized peptidase activity within residues
71-91, 212-225, 261-272 and 346-361. The
catalytic sites for the protease were predicted by InterProScan
within residues 54-219 and 225-367;
active sites were predicted within residues
80-91 and 261-272. The peptidase clan AA
(CL0129) contains aspartic peptidases, including the pepsins
and retropepsins. These enzymes contain a catalytic
dyad composed of two aspartates. In the retropepsins, one
is provided by each copy of a homodimeric protein,
whereas in the pepsin-like peptidases these aspartates
come from a single protein composed of two duplicated
domains. This clan contains 12 member families, viz.
Asp, Asp protease, Asp protease 2, DUF1758, gag-asp protease,
Peptidase A2B, Peptidase A2E, Peptidase A3, RVP,
RVP 2, Spuma A9PTase and Zn protease [26-28].

Figure 6 Prediction of signal peptide in buPAG2. Residues 1-12 are predicted as the signal peptide and cleavage site is predicted between
residues 12 and 13 by SignalP 4.0. Residues 13-41 were predicted by Pfam as the propeptide.
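
The segment boundaries reported above by SignalP and Pfam can be cross-checked against the stated segment lengths and active site positions. The following is a small illustrative sketch; the dictionary layout and function names are hypothetical, and only the residue numbers come from the text.

```python
from typing import Dict, Tuple

# Inclusive, 1-based residue ranges as reported in the text.
REGIONS: Dict[str, Tuple[int, int]] = {
    "signal peptide": (1, 12),
    "propeptide": (13, 41),
    "aspartyl protease domain": (64, 367),
}


def segment_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range."""
    return end - start + 1


def region_of(position: int) -> str:
    """Name of the annotated region containing a residue, if any."""
    for name, (start, end) in REGIONS.items():
        if start <= position <= end:
            return name
    return "unassigned"


if __name__ == "__main__":
    for name, (start, end) in REGIONS.items():
        print(f"{name}: {start}-{end} ({segment_length(start, end)} residues)")
    for active_site in (83, 264):  # active site positions from the text
        print(f"residue {active_site}: {region_of(active_site)}")
```

The inclusive ranges 13-41 and 64-367 give 29 and 304 residues, matching the lengths quoted, and both predicted active site positions fall inside the protease domain. Residues 42-63 are not assigned to either Pfam family.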

NCBI-CDD also recognized the A1 Propeptide
(cl06833) and the cellular and retroviral pepsin-like protease
(cl11403) superfamily sequences within buPAG2. The latter
superfamily is further classified into the peptidase families A1
(pepsin A) and A2 (retropepsin). Specifically, buPAG2
aligned with the superfamily member cd05478, i.e. Pepsin A.
The cellular pepsin and pepsin-like enzymes are twice as long
as their retroviral counterparts. These are found in mammals,
plants, fungi and bacteria. These well known and extensively
characterized enzymes include pepsins, chymosin, renin,
cathepsins, and fungal aspartic proteases. They contain two
domains possessing similar topological features. The N-
and C-terminal domains, although structurally related by a
2-fold axis, have only limited sequence homology except
in the vicinity of the active site, suggesting that the
enzymes evolved by an ancient duplication event. The
eukaryotic pepsin-like proteases have two active site Asp
residues, with each of the N- and C-terminal lobes contributing
one residue. While the fungal and mammalian pepsins are
bilobal proteins, retropepsins function as dimers, and the
monomer resembles the structure of the N- or C-terminal
domain of the eukaryotic enzyme. The active site motif
(Asp-Thr/Ser-Gly-Ser) is conserved between the retroviral and
eukaryotic proteases and between the N- and C-terminal domains of
eukaryotic pepsin-like proteases. These endopeptidases
specifically cleave bonds in peptides at least six residues in
length with hydrophobic residues in both the P1 and P1'
positions. The active site is located at the groove formed
by the two lobes, with an extended loop projecting over
the cleft to form an 11-residue flap, which encloses substrates
and inhibitors in the active site. Specificity is determined
by nearest-neighbor hydrophobic residues
surrounding the catalytic aspartates, and by three residues
in the flap. Nearly all known aspartyl proteases are
inhibited by pepstatin [26-28]. In our model, the inhibitor
binding site was predicted by NCBI-CDD to be formed of
residues 83, 85, 87, 123, 124, 125 and 169. NCBI-CDD
could predict only one active site, at residue 83, within a
catalytic motif formed by residues 83-85. Additionally,
active site flaps were predicted at residues 123-126 and
130-133.
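
The active site motif quoted above (Asp-Thr/Ser-Gly-Ser) is simple enough to locate by pattern matching. The sketch below scans a sequence for that motif exactly as written in the text; it is not how NCBI-CDD assigns active sites, and the function name and test peptide are hypothetical.

```python
import re
from typing import List, Tuple

# Asp, then Thr or Ser, then Gly, then Ser, as the motif is given in the text.
CATALYTIC_MOTIF = re.compile(r"D[TS]GS")


def find_catalytic_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the Asp, matched motif) for each occurrence."""
    return [(match.start() + 1, match.group())
            for match in CATALYTIC_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_peptide = "AADTGSAADSGSA"  # hypothetical sequence, not buPAG2
    for position, motif in find_catalytic_motifs(toy_peptide):
        print(f"Asp {position}: {motif}")
```

In a bilobal pepsin-like protease, one occurrence would be expected in each lobe, so a scan that finds only one motif may indicate a diverged second site rather than its absence.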

The ProKnow metaserver integrates outputs from PSI-BLAST,
PROSITE, DALI/DASEY, DIP and RIGOR to
extract similarity of the query sequence with proteins in
the ProKnow database. This information is subsequently
used to assign a weighted set of functions to the query
protein. Consensus results from the ProKnow and PFP servers
suggest inhibitory effects of buPAG2 on proteolysis,
immunological response and carbohydrate metabolism.
The predictions give strong support for MHC I binding by
buPAG2 and for down-regulation of the complement pathway.
Hashizume and co-workers put forth that PAGs may act as
immunosuppressants allowing for the immunological acceptance
of the embryo by the dam [4]; such effects may
be accounted for in part by the MHC binding and complement
inhibiting activity of PAGs. Also, a role of buPAG2
in regulation of transcription is predicted at a moderate
confidence level, possibly through DNA dependent and
GTP binding mechanisms. Pregnancy is a complex
physiological process requiring adaptations by the dam
on many fronts. While down-regulation of the immune response
is deemed essential for acceptance of the fetus as
a hemi-allograft, down-regulation of proteolysis and
carbohydrate metabolism may have nutritional consequences.
Alternatively, down-regulation of proteolysis may
also be an essential pre-requisite for controlled apoptotic
processes underlying fetal morphogenesis and/or
re-modeling of feto-maternal tissues; similar roles for PAGs
have been postulated in bovines by Hashizume et al. [4].
Similarly, regulation of transcription may also be
required for orchestration of a multitude of physiological
processes in response to pregnancy. PFP also recognizes
buPAG2 as an inducible, extracellular protein. Successful
maintenance and completion of pregnancy requires
the dam to produce molecular signals, mainly proteins,
which are involved in vital processes such as blockage of
PGF2α secretion and endometrial remodeling [29,30]. A
role of PAGs in implantation and placentogenesis has
also been proposed by Ishiwata et al. [31]. PAGs have
also been shown to possess luteotropic activity [32,33].

BlastP against microbial and eukaryotic DEG entries
did not recognize buPAG2 or an ortholog as a gene
product that is essential for survival of an organism.
Likewise, a KEGG search performed via KAAS did not
find buPAG2 to be essentially involved in any
metabolic pathway. The essentiality of an inducible
gene product with sex-restricted expression in
pregnant females is logically unlikely.



Conclusion 

In this study, homology modeling and a comparative genomics
approach have been used to propose the first 3D
structure and possible functions for bubaline pregnancy
associated glycoprotein 2. With the assistance of a well-defined
structure and annotations, the functional and
binding sites have been predicted, which will further the
understanding of the biological roles of the protein. Our
study predicts buPAG2 as an inducible, extra-cellular,
non-essential, N-glycosylated, aspartic pro-endopeptidase that
is involved in down-regulation of the complement pathway
and immunity during pregnancy. The protein is also predicted
to be involved in down-regulation of proteolysis
and carbohydrate metabolism, and in regulation of
transcription, which may be an essential pre-requisite for
controlled apoptotic processes underlying fetal morphogenesis
and re-modeling of feto-maternal tissues. These structural
and functional insights should allow the design of recombinant,
loss-of-function proteins and of inhibitors for dissection
of the exact physiological role of the PAGs.